Computational intelligence approach for gene expression data mining and classification
Abstract
Exploring high-dimensional gene expression microarray data demands powerful analytical tools. Our data mining software for cluster discovery, VISual Data Analyzer (VISDA), reveals many distinguishing patterns among gene expression profiles. This model-supported hierarchical data exploration tool combines two complementary schemes: discriminatory dimensionality reduction for structure-focused data visualization, and cluster decomposition by probabilistic clustering. At the top level, dimensionality reduction produces a visualization of the complete data set. The data set is then partitioned into subclusters, which can in turn be visualized at lower levels and, if necessary, partitioned again. These approaches produce different visualizations, which are compared against known phenotypes from the microarray experiments. For class prediction on cancers using microarray data, Multilayer Perceptrons (MLPs) are trained and optimized; their architecture and parameters are regularized and initialized by weighted Fisher Criterion (wFC)-based Discriminatory Component Analysis (DCA). Prediction performance is evaluated and compared via multifold cross-validation.
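As a rough illustration of the Fisher criterion idea mentioned in the abstract, the sketch below scores each gene by the ratio of between-class scatter to within-class scatter. It is not the authors' wFC-based DCA: it uses the plain, unweighted per-feature Fisher score, and the function name `fisher_scores`, the toy expression matrix, and the phenotype labels are all hypothetical.

```python
from statistics import mean, pvariance


def fisher_scores(samples, labels):
    """Return one unweighted Fisher score per feature (gene).

    samples: list of rows, one row of expression values per tissue sample.
    labels:  class label (phenotype) of each row.
    Score = between-class scatter / within-class scatter, per feature.
    """
    classes = sorted(set(labels))
    n_features = len(samples[0])
    scores = []
    for j in range(n_features):
        column = [row[j] for row in samples]
        overall = mean(column)
        between = 0.0
        within = 0.0
        for c in classes:
            values = [v for v, y in zip(column, labels) if y == c]
            # Scatter of the class mean around the overall mean.
            between += len(values) * (mean(values) - overall) ** 2
            # Scatter of the class members around their own mean.
            within += len(values) * pvariance(values)
        scores.append(between / within if within > 0 else float("inf"))
    return scores


# Hypothetical toy data: 4 samples, 2 genes, 2 phenotypes.
expression = [
    [1.0, 5.0],
    [1.2, 3.0],
    [3.0, 4.0],
    [3.2, 4.2],
]
phenotype = ["A", "A", "B", "B"]

scores = fisher_scores(expression, phenotype)
# Gene indices ordered from most to least discriminative.
ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
print(ranked)
```

In this toy data the first gene separates the two phenotypes cleanly and the second does not, so the first gene ranks higher. A weighted criterion such as the wFC named in the abstract would additionally weight class pairs, which this sketch does not attempt.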
Similar resources
A New Knowledge-Based System for Diagnosis of Breast Cancer by a combination of the Affinity Propagation and Firefly Algorithms
Breast cancer has become a widespread disease among young women around the world. Expert systems developed with data mining techniques are valuable tools in the diagnosis of breast cancer and can support physicians in the decision-making process. This paper presents a new hybrid data mining approach to classify two groups of breast cancer patients (malignant and benign). The proposed approach, AP-AMBFA, con...
Using Combined Descriptive and Predictive Methods of Data Mining for Coronary Artery Disease Prediction: a Case Study Approach
Heart disease is one of the major causes of morbidity in the world. Currently, a large proportion of healthcare data is not processed properly and therefore cannot be used effectively for decision making. The risk of heart disease may be predicted by investigating heart disease risk factors with data mining techniques. This paper presents a model developed using combined descri...
Credit scoring in banks and financial institutions via data mining techniques: A literature review
This paper presents a comprehensive review of the work done during 2000–2012 on the application of data mining techniques in credit scoring. Yet there isn’t any literature in the field of data mining applications in credit scoring. Using a novel research approach, this paper investigates academic and systematic literature review and includes all of the journals in the Science direct onli...
Ensemble Classification and Extended Feature Selection for Credit Card Fraud Detection
With the rise of technology, the possibility of fraud in areas such as banking has increased. Credit card fraud is a crucial problem in banking, and its danger is ever increasing. This paper proposes an advanced data mining method that considers both feature selection and decision cost to improve the accuracy of credit card fraud detection. After selecting the best and most effec...
Hybrid particle swarm optimization and tabu search approach for selecting genes for tumor classification using gene expression data
Gene expression data are characterized by thousands or even tens of thousands of measured genes on only a few tissue samples. This can lead to overfitting and the curse of dimensionality, or even to a complete failure in the analysis of microarray data. Gene selection is an important component of gene expression-based tumor classification systems. In this paper, we develop a hybrid particle swar...